London Bikes data

We’ll use data from http://www.tfl.gov.uk to analyse usage of the London Bike Sharing scheme. This data has already been downloaded for you and lives in a CSV (Comma Separated Values) file that you have to read into R.

There is no dropdown menu for reading data into R; instead we use functions like read_csv to load data from a file into an R object.

#read the CSV file
bike <- read_csv(here::here("data", "london_bikes.csv")) %>% 
  mutate(weekend = if_else((wday == "Sat" | wday == "Sun"), TRUE, FALSE))

Cleaning our data

Sometimes our data needs a bit of ‘cleaning’. For instance, the day of the week (wday) is of variable type character, or chr. We should, however, treat it as a categorical, or factor, variable and relevel it, so Monday is the first level of the factor (the first day of the week), and so on.
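A manual version of that releveling might look like the sketch below. Note this is illustrative only: the cleaning code further down achieves the same thing automatically with lubridate’s wday(label = TRUE), and the column name here is an assumption.

```r
# illustrative sketch: relevel a character day-of-week column into a factor
# whose first level is Monday (column name and labels are assumptions)
bike <- bike %>%
  mutate(day_of_week = factor(day_of_week,
                              levels = c("Mon", "Tue", "Wed", "Thu",
                                         "Fri", "Sat", "Sun")))
```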

R is fairly strict with dates. When you read a CSV file, the date may come in different formats. For instance, Christmas 2017 could be input as 12-25-2017, 25.12.2017, 25 Dec 2017, Dec 25, 2017, etc. To be consistent, we use lubridate’s ymd function, and force the variable date to be a date in the format YYYY-MM-DD.
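lubridate offers one parser per ordering; a minimal standalone sketch (not run on our data), showing that the different input formats all collapse to the same Date object:

```r
library(lubridate)

# each parser reads a different ordering, but all return the same Date
ymd("2017-12-25")    # year-month-day
dmy("25.12.2017")    # day-month-year
mdy("Dec 25, 2017")  # month-day-year
# all three print as "2017-12-25"
```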

Finally, we can turn season from 1, 2, 3, 4, to words like Winter, Spring, etc.

We will be talking more about data wrangling later, but for now just execute the following piece of code.

# fix dates using lubridate, and generate new variables for year, month, month_name, and day_of_week
bike <- bike %>%   
  mutate(
    year=year(date),
    month = month(date),
    month_name=month(date, label = TRUE),
    day_of_week = wday(date, label = TRUE)) 

# generate new variable season_name to turn seasons from numbers to Winter, Spring, etc
bike <- bike %>%  
  mutate(
    season_name = case_when(
      month_name %in%  c("Dec", "Jan", "Feb")  ~ "Winter",
      month_name %in%  c("Mar", "Apr", "May")  ~ "Spring",
      month_name %in%  c("Jun", "Jul", "Aug")  ~ "Summer",
      month_name %in%  c("Sep", "Oct", "Nov")  ~ "Autumn",
    ),
    season_name = factor(season_name, 
                         levels = c("Winter", "Spring", "Summer", "Autumn")))
    
# examine the structure of the dataframe
skim(bike)
── Data Summary ────────────────────────
Name                        bike
Number of rows              4750
Number of columns           20
Column type frequency:
  character                 1
  factor                    3
  logical                   1
  numeric                   14
  POSIXct                   1
Group variables             None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
wday 0 1 3 3 0 7 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
month_name 0 1 TRUE 12 Jul: 405, Jan: 403, Mar: 403, May: 403
day_of_week 0 1 TRUE 7 Sun: 679, Mon: 679, Fri: 679, Sat: 679
season_name 0 1 FALSE 4 Sum: 1198, Spr: 1196, Aut: 1183, Win: 1173

Variable type: logical

skim_variable n_missing complete_rate mean count
weekend 0 1 0.29 FAL: 3392, TRU: 1358

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
bikes_hired 0 1.00 26607.16 9744.66 0.0 19700.75 26349.5 33623.0 73094.0 ▂▇▆▁▁
year 0 1.00 2016.58 3.78 2010.0 2013.00 2017.0 2020.0 2023.0 ▆▇▆▇▇
month 0 1.00 6.52 3.45 1.0 4.00 7.0 10.0 12.0 ▇▅▅▅▇
week 0 1.00 26.58 15.05 1.0 14.00 27.0 40.0 53.0 ▇▇▇▇▇
cloud_cover 33 0.99 4.90 2.37 0.0 3.00 5.0 7.0 9.0 ▃▅▆▇▃
humidity 52 0.99 75.56 11.28 33.0 67.00 77.0 84.0 100.0 ▁▂▆▇▃
pressure 31 0.99 10154.60 104.04 9731.0 10093.00 10163.0 10225.0 10477.0 ▁▂▇▇▁
radiation 40 0.99 119.71 89.86 2.0 41.00 96.0 188.0 402.0 ▇▅▃▂▁
precipitation 31 0.99 17.11 37.78 0.0 0.00 0.0 16.0 516.0 ▇▁▁▁▁
snow_depth 302 0.94 0.02 0.24 0.0 0.00 0.0 0.0 7.0 ▇▁▁▁▁
sunshine 31 0.99 41.53 39.15 0.0 5.00 33.0 67.0 147.0 ▇▃▂▂▁
mean_temp 31 0.99 12.04 5.71 -4.1 7.70 11.9 16.6 30.9 ▁▇▇▅▁
min_temp 31 0.99 8.00 5.26 -9.4 4.10 8.2 12.1 22.3 ▁▃▇▇▁
max_temp 31 0.99 16.06 6.60 -1.2 11.00 15.7 21.0 40.2 ▂▇▇▂▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2010-07-30 2023-07-31 2017-01-28 12:00:00 4750

Summary Statistics

Besides the number of bikes hired each day, we also have data on the weather for that day.

Having loaded and cleaned our data, we can create summary statistics using mosaic’s favstats. This is uni-variate analysis, so in our mosaic syntax we will use favstats(~ bikes_hired, data = bike). We also want to get an overall, time-series plot of bikes over time; for the latter, we just create a scatter plot of bikes_hired vs date.

favstats(~ bikes_hired, data= bike)
 min       Q1   median       Q3      max     mean       sd    n missing
   0 1.97e+04 2.63e+04 3.36e+04 7.31e+04 2.66e+04 9.74e+03 4750       0

While this analysis shows us overall statistics, what if we wanted to get summary statistics by year, day_of_week, month, or season? Mosaic’s formula syntax Y ~ X allows us to break down our analysis of a variable Y by another variable X, using favstats(Y ~ X, data = ...)

favstats(bikes_hired ~ year, data=bike)
 year      min       Q1   median       Q3      max     mean       sd   n missing
 2010 2.76e+03  9.3e+03  1.4e+04 1.87e+04  2.8e+04 1.41e+04 5.62e+03 155       0
 2011 4.56e+03 1.63e+04 2.03e+04 2.37e+04 2.94e+04 1.96e+04  5.5e+03 365       0
 2012 3.53e+03 1.93e+04 2.62e+04 3.25e+04 4.71e+04  2.6e+04 9.43e+03 366       0
 2013 3.73e+03 1.76e+04  2.2e+04 2.74e+04 3.56e+04  2.2e+04 7.28e+03 365       0
 2014 4.33e+03 2.05e+04 2.77e+04 3.44e+04  4.9e+04 2.75e+04 9.07e+03 365       0
 2015 5.78e+03 2.21e+04 2.66e+04 3.29e+04 7.31e+04  2.7e+04 8.55e+03 365       0
 2016 4.89e+03 2.24e+04 2.79e+04 3.51e+04 4.66e+04 2.82e+04 8.85e+03 366       0
 2017 5.14e+03 2.41e+04 2.95e+04 3.45e+04  4.6e+04 2.86e+04 8.38e+03 365       0
 2018 5.86e+03 2.18e+04 2.92e+04 3.77e+04 4.61e+04  2.9e+04 1.02e+04 365       0
 2019 5.65e+03 2.42e+04 2.89e+04 3.45e+04 4.47e+04 2.86e+04 8.09e+03 365       0
 2020 4.87e+03 2.01e+04 2.75e+04 3.65e+04 7.02e+04 2.85e+04 1.16e+04 366       0
 2021 6.25e+03 2.17e+04  3.1e+04 3.82e+04 5.69e+04    3e+04  1.1e+04 365       0
 2022        0 2.51e+04 3.14e+04    4e+04  6.7e+04 3.15e+04 1.03e+04 365       0
 2023 9.26e+03 1.99e+04 2.33e+04 2.76e+04 3.53e+04 2.35e+04 5.64e+03 212       0
favstats(bikes_hired ~ day_of_week, data=bike)
 day_of_week      min       Q1   median       Q3      max     mean       sd   n missing
 Sun                0 1.41e+04 2.13e+04 3.04e+04 6.31e+04 2.26e+04 1.08e+04 679       0
 Mon         3.97e+03 2.05e+04 2.59e+04 3.21e+04  6.7e+04 2.63e+04 8.53e+03 679       0
 Tue         3.76e+03 2.25e+04 2.79e+04 3.49e+04 6.53e+04 2.84e+04 8.83e+03 678       0
 Wed         4.33e+03 2.26e+04  2.8e+04 3.47e+04 5.44e+04 2.85e+04 8.79e+03 678       0
 Thu         5.65e+03 2.26e+04 2.77e+04 3.52e+04 7.31e+04 2.85e+04 9.01e+03 678       0
 Fri          5.4e+03 2.14e+04 2.69e+04 3.41e+04  6.7e+04 2.73e+04 8.77e+03 679       0
 Sat                0  1.6e+04 2.28e+04 3.27e+04 7.02e+04 2.46e+04 1.14e+04 679       0
favstats(bikes_hired ~ month_name, data=bike)
 month_name      min       Q1   median       Q3      max     mean       sd   n missing
 Jan        3.73e+03 1.39e+04  1.9e+04 2.34e+04  3.8e+04 1.87e+04 6.09e+03 403       0
 Feb        3.53e+03  1.6e+04 2.07e+04 2.46e+04 5.25e+04 2.04e+04 6.53e+03 367       0
 Mar        5.06e+03 1.77e+04 2.34e+04 2.76e+04 5.66e+04 2.29e+04 7.74e+03 403       0
 Apr        4.87e+03 2.11e+04 2.62e+04 3.13e+04  4.9e+04 2.62e+04 7.78e+03 390       0
 May         1.2e+04 2.46e+04 3.04e+04 3.65e+04 7.02e+04 3.07e+04 8.56e+03 403       0
 Jun        6.06e+03 2.79e+04 3.39e+04 3.96e+04 6.53e+04 3.37e+04 8.77e+03 390       0
 Jul        5.56e+03 2.99e+04 3.69e+04 4.17e+04 7.31e+04 3.52e+04 8.37e+03 405       0
 Aug         4.3e+03  2.7e+04 3.43e+04 3.93e+04  6.7e+04 3.21e+04 9.95e+03 403       0
 Sep               0  2.5e+04 3.23e+04 3.67e+04 5.18e+04 3.08e+04 8.47e+03 390       0
 Oct        7.07e+03 2.33e+04 2.84e+04  3.3e+04 4.79e+04 2.77e+04 6.99e+03 403       0
 Nov        6.03e+03 1.87e+04 2.37e+04 2.81e+04 4.47e+04 2.33e+04 6.64e+03 390       0
 Dec        2.76e+03 1.17e+04 1.67e+04 2.31e+04 3.91e+04 1.72e+04 7.03e+03 403       0
favstats(bikes_hired ~ season_name, data=bike)
 season_name      min       Q1   median       Q3      max     mean       sd    n missing
 Winter      2.76e+03 1.38e+04 1.89e+04 2.37e+04 5.25e+04 1.87e+04 6.68e+03 1173       0
 Spring      4.87e+03 2.08e+04 2.65e+04 3.18e+04 7.02e+04 2.66e+04 8.66e+03 1196       0
 Summer       4.3e+03 2.81e+04 3.48e+04    4e+04 7.31e+04 3.37e+04 9.13e+03 1198       0
 Autumn             0 2.18e+04 2.77e+04 3.31e+04 5.18e+04 2.73e+04 8.01e+03 1183       0
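If you prefer to stay within the tidyverse, the same by-group summaries can be sketched with dplyr’s group_by() and summarise(); this is equivalent to favstats(bikes_hired ~ season_name, data = bike) above:

```r
# the same by-season summary statistics using dplyr instead of mosaic
library(dplyr)

bike %>%
  group_by(season_name) %>%
  summarise(min    = min(bikes_hired),
            median = median(bikes_hired),
            mean   = mean(bikes_hired),
            sd     = sd(bikes_hired),
            n      = n())
```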

Exploratory Data Analysis

Time series plot of bikes rented

While summary statistics allow us to quickly discover key metrics that represent the data, they are often not sufficient by themselves. Often, it is useful to represent the data graphically (or visually) to uncover information that is critical for decision making, such as trends, patterns and outliers.

In this section we will create a time-series scatter-plot that shows the number of bikes hired on each day. We will use the ggplot2 library.

Creating a basic plot is a two-step process. The first step is to specify the data frame and the axes using the syntax ggplot(data_frame, aes(x = variable, y = variable)). The second step is to specify the type of plot we want; in this case, a scatter (point) plot, added with geom_point() after a + sign.

ggplot(bike, aes(x=date, y=bikes_hired))+
  geom_smooth()+
  geom_point(alpha = 0.3)+
  theme_bw()+
  NULL
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Creating a Plot

If you were a consultant or an analyst, would you be comfortable showing the plot you created in the previous section to your client? Think about it. Probably not.

In this section, we will refine our plot to make it presentable (and to communicate the information more effectively).

Without a title, it becomes harder to comprehend the information presented in the plot. In addition, the labels of the X and Y axes are set to the respective variable (column) names by default, and are not formatted in the same way. This is unprofessional, and in some cases, confusing.

We start by using the labs() function to specify the title, the X-axis label and the Y-axis label. Notice that we are simply adding another layer to the previous plot by using the + symbol.

ggplot(bike, aes(x=date, y=bikes_hired))+
  geom_point(alpha = 0.3)+
  geom_smooth()+
  theme_bw()+
  labs(
    title    = "TfL Daily Bike Rentals",
    x        = "Day",
    y        = "Number of Bikes Hired"
  )+
  NULL
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

The plot looks much better, but we can further improve the look of the plot using the default themes in R. The graph below uses a default theme called theme_light().

ggplot(bike, aes(x=date, y=bikes_hired))+
  geom_point(alpha=0.5) +
  geom_smooth(color="red", size=1.5) +
  labs(
    title    = "TfL Daily Bike Rentals",
    x        = "Day",
    y        = "Number of Bikes Hired"
  )+
  theme_light()+
  NULL
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Further graphs

# Histogram of bikes rented
ggplot(bike, aes(x=bikes_hired))+
  geom_histogram()+
  theme_bw()+
  NULL
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Histogram faceted by season_name
ggplot(bike, aes(x=bikes_hired))+
  geom_histogram()+
  facet_wrap(~season_name)+
  theme_bw()+
  NULL
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Histogram faceted by month_name
ggplot(bike, aes(x=bikes_hired))+
  geom_histogram()+
  facet_wrap(~month_name)+
  theme_bw()+
  NULL
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Histogram faceted by month_name in 4 rows
ggplot(bike, aes(x=bikes_hired))+
  geom_histogram()+
  facet_wrap(~month_name, nrow = 4)+
  theme_bw()+
  NULL
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Density plot 
ggplot(bike, aes(x=bikes_hired))+
  geom_density()+
  theme_bw()+
  NULL

# Density plot filled by season_name 
ggplot(bike, aes(x=bikes_hired))+
  geom_density(aes(fill=season_name), alpha = 0.3)+
  theme_bw()+
  NULL

# Histogram filled by season_name, and faceted by season_name
ggplot(bike, aes(x=bikes_hired))+
  geom_histogram(aes(fill=season_name), alpha = 0.5)+
  facet_wrap(~season_name, nrow = 4)+
  theme_minimal()+
  
  # remove the legend on the right
  theme(legend.position = "none")+
  NULL
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Density plot filled by season_name, and faceted by month_name
ggplot(bike, aes(x=bikes_hired))+
  geom_density(aes(fill=season_name), alpha = 0.3)+
  facet_wrap(~month_name, nrow = 4)+
  theme_bw()+
  theme(legend.position="none")+
  NULL

#Boxplot of bikes_hired  by month
# since 'month' is a number, it treats it as a continuous variable; hence we get just one box
ggplot(bike, aes(x=month, y= bikes_hired))+
  geom_boxplot()+
  theme_bw()+
  NULL
## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

#Boxplot  by month_name
ggplot(bike, aes(x=month_name, y= bikes_hired))+
  geom_boxplot()+
  theme_bw()+
  NULL

#Boxplot by month_name, filled by season_name
ggplot(bike, aes(x=month_name, y= bikes_hired, fill=season_name))+
  geom_boxplot()+
  theme_bw()+
  NULL

#Violin plot by month_name
ggplot(bike, aes(x=month_name, y= bikes_hired))+
  geom_violin()+
  theme_bw()+
  NULL

# bikes_hired vs. weather features
ggplot(bike, aes(x=mean_temp, y= bikes_hired))+
  geom_point()+
  geom_smooth(method = "lm", se=FALSE)+
  theme_bw()+
  NULL
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 31 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 31 rows containing missing values (`geom_point()`).

ggplot(bike, aes(x=mean_temp, y= bikes_hired, colour=season_name))+
  geom_point(alpha = 0.2)+
  geom_smooth(method = "lm", se=FALSE)+
  theme_bw()+
  #  facet_wrap(~season_name, ncol=1)+
  NULL
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 31 rows containing non-finite values (`stat_smooth()`).
## Removed 31 rows containing missing values (`geom_point()`).

temperature_by_season <- ggplot(bike, aes(x=mean_temp, y= bikes_hired,colour=season_name)) +
  
  # rather than using geom_point(), we use geom_point_interactive()
  geom_point_interactive(aes( 
    tooltip = glue::glue("Mean Temp: {mean_temp}\nBikes Hired: {bikes_hired}\nDate: {date}")),
    alpha = 0.3) +
  geom_smooth_interactive(se = FALSE, method = lm)+
  theme_bw()+
  facet_wrap(~season_name, ncol=1)+
  #  facet_grid(season_name ~ weekend)+
  
  theme(legend.position = "none")+
  NULL

# having created the ggplot object, we now pass it to girafe() to render it
girafe(
  ggobj = temperature_by_season
)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 31 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 31 rows containing missing values
## (`geom_interactive_point()`).
# bikes_hired vs. humidity
ggplot(bike, aes(x=humidity, y= bikes_hired))+
  geom_point()+
  geom_smooth(method = "lm")+
  theme_bw()+
  NULL
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 52 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 52 rows containing missing values (`geom_point()`).

Model building

Correlation - Scatterplot matrix

Besides the number of bikes rented out on a given day, we have also downloaded weather data for London, such as mean_temp, humidity, pressure, precipitation, etc., that measure weather conditions on a single day. It may be the case that more bikes are rented out when it’s warmer. And how can we estimate the effect of rain on rentals?

Your task is to build a regression model that helps you explain the number of rentals per day.

Let us select a few of these numerical variables and create a scatterplot-correlation matrix.

bike %>% 
  select(cloud_cover, humidity, pressure, radiation, precipitation, sunshine, mean_temp, bikes_hired) %>% 
  GGally::ggpairs()

Weekend or weekdays? Is there a difference?

t.test(bikes_hired ~ weekend, data= bike)
## 
##  Welch Two Sample t-test
## 
## data:  bikes_hired by weekend
## t = 12.511, df = 2068.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  3575.400 4904.607
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            27819.36            23579.36

Model 0: using the mean to predict bikes_hired

We start with a naive model where we just use the average to predict how many bikes will be hired on any given day.

favstats(~bikes_hired, data = bike)
 min       Q1   median       Q3      max     mean       sd    n missing
   0 1.97e+04 2.63e+04 3.36e+04 7.31e+04 2.66e+04 9.74e+03 4750       0
# can you create a confidence interval for mean bikes_hired? What is the SE?
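One way to answer the question in the comment above uses SE = sd/sqrt(n) and the numbers copied from the favstats output just printed (a sketch built from summary statistics rather than the raw data):

```r
# 95% confidence interval for mean bikes_hired, from the favstats output
xbar <- 26607.16                   # sample mean
s    <- 9744.66                    # sample standard deviation
n    <- 4750                       # number of observations

se     <- s / sqrt(n)              # standard error of the mean, about 141.4
t_crit <- qt(0.975, df = n - 1)    # critical t value, close to 1.96 for large n

c(lower = xbar - t_crit * se,
  upper = xbar + t_crit * se)      # roughly 26330 to 26884
```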

model0 <- lm(bikes_hired ~ 1, data= bike)
msummary(model0)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26607.2      141.4   188.2   <2e-16 ***
## 
## Residual standard error: 9745 on 4749 degrees of freedom

What is the regression’s residual standard error? What is the intercept’s standard error? (Hint: in an intercept-only model, compare them to the sample standard deviation and to sd/sqrt(n) from the favstats output above.)

Model 1: bikes_hired on mean_temp

# Define the model
model1 <- lm(bikes_hired ~ mean_temp, data = bike)

# look at the estimated model
msummary(model1)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13555.40     256.90   52.77   <2e-16 ***
## mean_temp    1084.61      19.29   56.24   <2e-16 ***
## 
## Residual standard error: 7558 on 4717 degrees of freedom
##   (31 observations deleted due to missingness)
## Multiple R-squared:  0.4014, Adjusted R-squared:  0.4012 
## F-statistic:  3163 on 1 and 4717 DF,  p-value: < 2.2e-16
# Diagnostics
autoplot(model1)

check_model(model1)

  • Is the effect of mean_temp significant? Why?
  • What proportion of the overall variability in bike rentals does temperature explain?

Model 2: bikes_hired on mean_temp plus weekend

# Define the model
model2 <- lm(bikes_hired ~ mean_temp + weekend, data = bike)

# look at the estimated model
msummary(model2)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14761.58     257.98   57.22   <2e-16 ***
## mean_temp    1083.44      18.68   58.00   <2e-16 ***
## weekendTRUE -4173.41     235.89  -17.69   <2e-16 ***
## 
## Residual standard error: 7320 on 4716 degrees of freedom
##   (31 observations deleted due to missingness)
## Multiple R-squared:  0.4386, Adjusted R-squared:  0.4384 
## F-statistic:  1842 on 2 and 4716 DF,  p-value: < 2.2e-16
# Diagnostics
autoplot(model2)

check_model(model2)

  • Fit a regression model called model2 with the following explanatory variables: mean_temp and weekend

    • Is the effect of mean_temp significant? Why?
    • What is the meaning of the effect (slope) of weekendTRUE? What % of the variability of bikes_hired does your model explain?

Model 3: bikes_hired on mean_temp plus wday

# Define the model
model3 <- lm(bikes_hired ~ mean_temp + wday, data = bike)

# look at the estimated model
msummary(model3)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14268.95     358.68  39.782  < 2e-16 ***
## mean_temp    1082.30      18.56  58.326  < 2e-16 ***
## wdayMon      -911.51     395.96  -2.302  0.02138 *  
## wdaySat     -2703.61     395.95  -6.828 9.69e-12 ***
## wdaySun     -4630.43     395.95 -11.694  < 2e-16 ***
## wdayThu      1124.91     395.95   2.841  0.00452 ** 
## wdayTue      1169.70     395.95   2.954  0.00315 ** 
## wdayWed      1149.84     395.95   2.904  0.00370 ** 
## 
## Residual standard error: 7271 on 4711 degrees of freedom
##   (31 observations deleted due to missingness)
## Multiple R-squared:  0.4466, Adjusted R-squared:  0.4458 
## F-statistic: 543.2 on 7 and 4711 DF,  p-value: < 2.2e-16
# Diagnostics
autoplot(model3)

check_model(model3)

What is the meaning of the effect (slope) of, e.g., wdayMon? What % of the variability of bikes_hired does your model explain?

Further variables/questions to explore on your own

Our dataset has many more variables, so here are some ideas on how you can extend your analysis:

  • Are other weather variables useful in explaining bikes_hired?
  • We also have data on days of the week, month of the year, etc. Could those be helpful?
  • What’s the best model you can come up with?
  • Is this a regression model to predict or explain? If we use it to predict, what’s the Residual SE?
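As a starting point for these questions, one possible extension is sketched below; whether the extra predictors actually help (and whether they are collinear) is for you to check:

```r
# sketch: extend model3 with a weather variable and a season term
model4 <- lm(bikes_hired ~ mean_temp + wday + sunshine + season_name,
             data = bike)

msummary(model4)        # are the new effects significant? did adj. R-squared improve?
car::vif(model4)        # mean_temp and season_name may be collinear -- check the VIF
huxreg(model3, model4)  # side-by-side comparison of the two models
```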

Plotting your best model’s predictions vs reality

Let us say that your best model is model3. How do the predictions of your model compare to the actual data?

# plot actual data vs predicted data

my_best_model <- broom::augment(model3) %>% 
  mutate(day=row_number())



# Plot actual values (black) vs fitted values (red)
ggplot(my_best_model, aes(x=day, y = bikes_hired)) +
  geom_point(alpha = 0.2) +
  geom_point(aes(y = .fitted), 
             shape = 1, 
             colour = "red", alpha = 0.2)+
  theme_bw()

Diagnostics, collinearity, summary tables

As you keep building your models, it makes sense to:

  1. Check the residuals, using performance::check_model(model_x). You will always see some deviation from normality, especially in the tails and for large values of n.
  2. As you start building models with more explanatory variables, make sure you use car::vif(model_x) to calculate the Variance Inflation Factor (VIF) for your predictors and determine whether you have collinear variables. A general guideline is that a VIF larger than 10 signals a collinearity problem; remove the variable in question and run your model again without it.
# Check VIF

vif(model2)
## mean_temp   weekend 
##  1.000012  1.000012
vif(model3)
##               GVIF Df GVIF^(1/(2*Df))
## mean_temp 1.000147  1        1.000073
## wday      1.000147  6        1.000012

Comparison of different models

Create a summary table, using huxtable (https://am01-sep23.netlify.app/example/modelling_side_by_side_tables/), that shows which models you worked on, which predictors are significant, the adjusted R-squared, and the Residual Standard Error. If you want to add more models, just remember to separate them with commas, as shown below.

# produce summary table comparing models using huxtable::huxreg()
huxreg(model0, model1, model2, model3, 
       statistics = c('#observations' = 'nobs', 
                      'R squared' = 'r.squared', 
                      'Adj. R Squared' = 'adj.r.squared', 
                      'Residual SE' = 'sigma'), 
       bold_signif = 0.05
) %>% 
  set_caption('Comparison of models')
Comparison of models
                          (1)            (2)            (3)            (4)
──────────────────────────────────────────────────────────────────────────────
(Intercept)     26607.165 ***  13555.399 ***  14761.577 ***  14268.948 ***
                  (141.390)      (256.902)      (257.977)      (358.679)
mean_temp                       1084.607 ***   1083.442 ***   1082.295 ***
                                  (19.287)       (18.679)       (18.556)
weekendTRUE                                   -4173.405 ***
                                                (235.893)
wdayMon                                                       -911.506 *
                                                              (395.959)
wdaySat                                                      -2703.608 ***
                                                              (395.950)
wdaySun                                                      -4630.427 ***
                                                              (395.955)
wdayThu                                                       1124.915 **
                                                              (395.949)
wdayTue                                                       1169.699 **
                                                              (395.953)
wdayWed                                                       1149.836 **
                                                              (395.949)
──────────────────────────────────────────────────────────────────────────────
#observations        4750           4719           4719           4719
R squared           0.000          0.401          0.439          0.447
Adj. R Squared      0.000          0.401          0.438          0.446
Residual SE      9744.660       7558.215       7320.001       7271.345
──────────────────────────────────────────────────────────────────────────────
*** p < 0.001; ** p < 0.01; * p < 0.05.